83 research outputs found

    Compositional data for global monitoring: the case of drinking water and sanitation

    Introduction: At the global level, access to safe drinking water and sanitation has been monitored by the WHO/UNICEF Joint Monitoring Programme (JMP). Its methods are based on the analysis of household survey data and on linear regression modelling of these results over time. However, there is evidence of non-linearity in the JMP data, and the compositional nature of these data is not taken into consideration. This article seeks to address these two shortcomings in order to produce more accurate estimates. Methods: We employed an isometric log-ratio (ilr) transformation designed for compositional data, and applied linear and non-linear time regressions to both the original and the transformed data. Specifically, we analysed different modelling alternatives for non-linear trajectories, all of them based on a generalized additive model (GAM). Results and discussion: Non-linear methods such as GAM may be used to model non-linear trajectories in the JMP data; this projection method is particularly suited to data-rich countries. Moreover, the ilr transformation of compositional data is conceptually sound and fairly simple to implement. It improves the performance of both linear and non-linear regression models, particularly in the presence of extreme data points, i.e. when coverage rates are near either 0% or 100%. Peer Reviewed. Postprint (author's final draft)
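
A minimal sketch of the ilr step described above, assuming a hypothetical three-part coverage composition (the shares are invented for illustration, not JMP data). It uses a Helmert-type orthonormal basis, which is one common ilr choice; the basis used in the article is not stated here. Only a straight-line time trend is fitted on the ilr coordinates; the GAM alternatives are not reproduced.

```python
import numpy as np


def closure(x):
    """Rescale the parts of each composition (last axis) so they sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def helmert_basis(D):
    """(D-1) x D orthonormal contrast matrix; every row sums to zero."""
    V = np.zeros((D - 1, D))
    for i in range(1, D):
        V[i - 1, :i] = 1.0 / i
        V[i - 1, i] = -1.0
        V[i - 1] *= np.sqrt(i / (i + 1.0))
    return V


def ilr(x):
    """Isometric log-ratio coordinates: clr(x) projected onto the basis rows."""
    logx = np.log(closure(x))
    clr = logx - logx.mean(axis=-1, keepdims=True)
    return clr @ helmert_basis(logx.shape[-1]).T


def ilr_inv(y):
    """Map ilr coordinates back to compositions (positive parts summing to 1)."""
    y = np.atleast_2d(np.asarray(y, dtype=float))
    V = helmert_basis(y.shape[-1] + 1)
    return closure(np.exp(y @ V))


# Hypothetical coverage shares for three service levels (illustrative only)
years = np.array([2000, 2005, 2010, 2015])
shares = np.array([[0.30, 0.40, 0.30],
                   [0.40, 0.38, 0.22],
                   [0.52, 0.33, 0.15],
                   [0.63, 0.28, 0.09]])

coords = ilr(shares)
# Fit a straight-line time trend to each ilr coordinate separately.
fits = [np.polyfit(years, coords[:, k], 1) for k in range(coords.shape[1])]
future = np.array([2020, 2030])
pred = np.column_stack([np.polyval(f, future) for f in fits])
projected = ilr_inv(pred)  # projected shares stay in (0, 1) and sum to 1
print(projected.round(3))
```

Because the regression runs on unbounded ilr coordinates and is back-transformed, the projections cannot leave the 0% to 100% range, which is where a linear fit on raw proportions tends to break down.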

    Bayesian estimation of extremal copulas in Poisson processes (Estimación bayesiana de cópulas extremales en procesos de Poisson)

    The estimation of occurrence probabilities of extremal quantities is essential in the study of hazards associated with natural phenomena. The extremal quantities of interest usually correspond to phenomena characterized by two or more magnitudes, which often show dependence among them. To better characterize situations that could be dangerous, the magnitudes that describe the phenomenon should be described jointly. A Poisson-GPD model, which describes the occurrence of extremal events and their marginal sizes, has been established: the occurrence of extremal events is represented by a Poisson process, and each event is characterized by a size modelled by a Generalized Pareto Distribution (GPD). The dependence between events is modelled through copula functions: a family of Gumbel copulas, suitable for the type of data treated, and a newly introduced type of copula, the CrEnC copula. The CrEnC copula minimizes the mutual information in situations in which only partial information is available in the form of restrictions, such as marginal models or joint moments of the variables. Representing these copulas in ℝ² improves both their estimation and the assessment of their goodness of fit to the data. An algorithm for estimating CrEnC copulas is provided, which includes a Monte Carlo approximation of the normalizing functions. In this context data are often scarce, so the uncertainty in the estimation of the model is large. A Bayesian estimation procedure that takes this uncertainty into account has been established. The goodness of fit of several aspects of the model (GPD goodness of fit, the GPD-Weibull hypothesis and global goodness of fit) has been checked using a selection of Bayesian p-values, which incorporate the uncertainty of the parameter estimation. Once the model has been estimated, the information is post-processed to obtain posterior quantities of interest, such as exceedance probabilities of reference values or return periods of events of a given size. The proposed model is applied to three datasets with different characteristics. The results are good: the introduced CrEnC copulas correctly represent the dependence in situations in which only partial information is available, and the Bayesian estimation of the model parameters adds value to the results, because it allows the uncertainty of the posterior estimates, such as hazard and dependence parameters, to be evaluated and taken into account when obtaining the desired posterior quantities.
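
The post-processing step (exceedance probabilities and return periods) can be sketched as follows, under stated assumptions: events above a threshold u arrive as a Poisson process with rate lam per year, excesses follow a GPD with scale sigma and shape xi, and the two magnitudes of an event are coupled by a Gumbel copula. The CrEnC copula and the MCMC sampler are not reproduced, and the "posterior draws" below come from arbitrary stand-in distributions, not from any fitted model; all function names are hypothetical.

```python
import numpy as np


def gpd_survival(y, sigma, xi):
    """P(Y > y) for a Generalized Pareto excess Y with scale sigma > 0, shape xi."""
    y = np.asarray(y, dtype=float)
    if abs(xi) < 1e-12:                            # exponential limit as xi -> 0
        return np.exp(-y / sigma)
    base = np.maximum(1.0 + xi * y / sigma, 0.0)   # zero beyond the endpoint when xi < 0
    return base ** (-1.0 / xi)


def return_period(x, u, lam, sigma, xi):
    """Mean time between events whose size exceeds x (years if lam is per year)."""
    return 1.0 / (lam * gpd_survival(x - u, sigma, xi))


def gumbel_copula(a, b, theta):
    """Gumbel copula C(a, b) with theta >= 1; theta = 1 gives independence a * b."""
    s = (-np.log(a)) ** theta + (-np.log(b)) ** theta
    return np.exp(-s ** (1.0 / theta))


def joint_return_period(x, u, lam, margins, theta):
    """Return period of events whose two magnitudes exceed x[0] and x[1] together.

    margins[k] = (sigma_k, xi_k) for magnitude k (hypothetical parametrization).
    """
    F = [1.0 - gpd_survival(x[k] - u[k], *margins[k]) for k in range(2)]
    joint_exceed = 1.0 - F[0] - F[1] + gumbel_copula(F[0], F[1], theta)
    return 1.0 / (lam * joint_exceed)


# Stand-in "posterior draws" of (lam, sigma, xi); a real analysis would use MCMC output.
rng = np.random.default_rng(1)
draws = zip(rng.gamma(20.0, 0.1, 1000),
            rng.gamma(50.0, 0.02, 1000),
            rng.normal(0.1, 0.05, 1000))
T = np.array([return_period(5.0, 2.0, lam, sig, xi) for lam, sig, xi in draws])
print(np.quantile(T, [0.05, 0.5, 0.95]))  # uncertainty band for the return period
```

Evaluating the return period once per draw, rather than once at a point estimate, is what lets the parameter uncertainty propagate into the hazard quantities.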

    Bayesian estimation of the orthogonal decomposition of a contingency table

    Under multinomial sampling, contingency tables can be parametrized by the probability of each cell. These probabilities constitute the joint probability function of two or more discrete random variables. Such probability tables have previously been studied from a compositional point of view; the compositional analysis of probability tables ensures coherence when analysing sub-tables. The main results are: (1) given a probability table, the closest independent probability table is the product of its geometric marginals; (2) the probability table can be orthogonally decomposed into an independent table and an interaction table; (3) the departure from independence can be measured using the simplicial deviance, which is the squared Aitchison norm of the interaction table. In previous works, the analysis was performed from a frequentist point of view. This contribution aims to provide a Bayesian assessment of the decomposition. The resulting model is log-linear, and its parameters are the centered log-ratio (clr) transformations of the geometric marginals and of the interaction table. With a Dirichlet prior on the multinomial probabilities, the posterior distribution of the multinomial probabilities is again a Dirichlet distribution. Simulating from this posterior makes it possible to study the distribution of the marginal and interaction parameters, and thus to check the independence of the observed contingency table and the cell interactions. Results for a two-way contingency table example are presented. Peer Reviewed. Postprint (published version)
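
The decomposition and its Bayesian assessment can be sketched in clr coordinates: the row and column means of clr(P) are the clr of the geometric marginals, their sum is the clr of the closest independent table, and the remainder is the interaction table. This is a minimal illustration under assumptions: the 2 x 2 counts are hypothetical, and the symmetric Dirichlet(0.5) prior is an arbitrary choice, since the abstract does not state the prior parameters.

```python
import numpy as np


def decompose(P):
    """Orthogonal decomposition of a probability table in clr coordinates.

    Returns (independent part, interaction part); they add up to clr(P).
    """
    logP = np.log(P)
    clr = logP - logP.mean()
    row = clr.mean(axis=1, keepdims=True)  # clr of the row geometric marginal
    col = clr.mean(axis=0, keepdims=True)  # clr of the column geometric marginal
    independent = row + col                # clr of the closest independent table
    return independent, clr - independent


def clr_inv(z):
    """Back-transform clr coordinates to a table of probabilities summing to 1."""
    e = np.exp(z)
    return e / e.sum()


def simplicial_deviance(P):
    """Squared Aitchison norm of the interaction table (sum of squared clr values)."""
    _, interaction = decompose(P)
    return float((interaction ** 2).sum())


# Hypothetical 2 x 2 table of counts; Dirichlet(0.5) prior is an assumption.
counts = np.array([[30, 10], [12, 28]])
alpha = np.full(counts.size, 0.5) + counts.ravel()   # Dirichlet posterior parameters
rng = np.random.default_rng(0)
draws = rng.dirichlet(alpha, size=4000)
dev = np.array([simplicial_deviance(d.reshape(counts.shape)) for d in draws])
lo, hi = np.quantile(dev, [0.05, 0.95])
print(f"90% posterior interval for the simplicial deviance: [{lo:.3f}, {hi:.3f}]")
```

For a 2 x 2 table every interaction cell equals plus or minus ln(OR)/4, so the simplicial deviance reduces to (ln OR)²/4, where OR is the cross-product (odds) ratio; an interval well away from zero points to dependence.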

    Characterizing the evolution of the container traffic share in the Mediterranean Sea using hierarchical clustering

    This research investigates the evolution of the traffic share of container throughput in Mediterranean ports from 2000 to 2015 using hierarchical clustering and concentration indexes. Compositional data analysis techniques are used to identify periods with similar traffic share composition. Two regions of the Mediterranean Sea (Med), East and West, are defined according to the long-haul services. The standard concentration indexes (concentration ratio, Gini coefficient and normalized Herfindahl-Hirschman index) reveal a gradual decrease in concentration, with relevant fluctuations mainly in the East region. This is due to investment in port infrastructure in that area resulting from privatization initiatives in many Eastern Mediterranean countries. The periods obtained from the hierarchical clustering show a differentiated pattern in traffic share composition. For these periods, the shift-share results are consistent with traffic fluctuations and in line with the evolution of the concentration indexes. The combination of methods has allowed a good interpretation of the spatial and temporal evolution of traffic in Med ports, and the methodology is applicable elsewhere in the context of port system analysis. Peer Reviewed. Postprint (published version)
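
The three standard concentration indexes named above can be computed from a vector of port throughputs as sketched below. This is a minimal illustration using common textbook formulas (ascending-rank Gini, HHI rescaled to [0, 1]); the exact variants used in the study are not stated, and the TEU figures are hypothetical.

```python
import numpy as np


def concentration_ratio(throughput, k):
    """CR_k: combined share of the k largest ports."""
    s = np.sort(np.asarray(throughput, dtype=float))[::-1]
    return float(s[:k].sum() / s.sum())


def normalized_hhi(throughput):
    """Herfindahl-Hirschman index rescaled to [0, 1] for n ports."""
    s = np.asarray(throughput, dtype=float)
    s = s / s.sum()
    n = s.size
    hhi = float((s ** 2).sum())
    return (hhi - 1.0 / n) / (1.0 - 1.0 / n)


def gini(throughput):
    """Gini coefficient of the traffic shares (0 = perfectly even)."""
    s = np.sort(np.asarray(throughput, dtype=float))   # ascending order
    n = s.size
    ranks = np.arange(1, n + 1)
    return float(2.0 * (ranks * s).sum() / (n * s.sum()) - (n + 1.0) / n)


# Hypothetical annual throughput (TEU) of five ports in one region
teu = [3.1e6, 1.8e6, 0.9e6, 0.5e6, 0.2e6]
print(concentration_ratio(teu, 2), normalized_hhi(teu), gini(teu))
```

Computing the three indexes year by year gives the concentration time series that the clustered periods and shift-share results are compared against.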

    Pluviometric regionalization of Catalunya: a compositional data methodology

    The aim of this paper is to introduce a methodology for defining groups from regionalized compositional data, through a hierarchical clustering algorithm aware of both the spatial dependence and the compositional character of the data set. This method is used to define a regionalization of Catalunya (NE Spain) with respect to its precipitation patterns in the winter season. This region is characterized by a highly contrasted topography, which plays a dominant role in the spatial distribution of precipitation. Each rain gauge station is characterized by the relative frequencies of occurrence of six intervals of daily precipitation amount (classes ranging from "no rain", for precipitation below 3 mm, to "heavy storm", above 50 mm). Since these frequencies are compositional data, the spatial dependence of the data set has been characterized by variograms of the set of all pairwise log-ratios, in the fashion of the variation matrix. A Mahalanobis distance between stations has then been defined using these variograms, so that gauges with high spatial correlation are assigned smaller distances. This spatially dependent distance criterion has been used in Ward's hierarchical clustering method to define the regions. The results reveal five fairly homogeneous groups of stations, most of which can be ascribed a physical meaning. Finally, possible links to regional circulation patterns are discussed. Postprint (published version)
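
The two building blocks named above, the variation matrix and the variograms of all pairwise log-ratios, can be sketched as follows. This is only an illustration under assumptions: it uses the classical (Matheron) semivariogram estimator on each log-ratio for a single distance lag, with three hypothetical precipitation classes instead of six and invented gauge coordinates; the paper's variogram modelling, Mahalanobis distance and Ward clustering are not reproduced.

```python
import numpy as np


def variation_matrix(X):
    """T[i, j] = sample variance of ln(x_i / x_j) across stations (rows of X)."""
    L = np.log(X)
    lr = L[:, :, None] - L[:, None, :]        # lr[s, i, j] = ln(x_i / x_j) at station s
    return lr.var(axis=0, ddof=1)


def logratio_semivariogram(X, xy, lag_lo, lag_hi):
    """Matheron estimator of the semivariogram of every pairwise log-ratio,
    using station pairs whose separation lies in [lag_lo, lag_hi)."""
    L = np.log(X)
    lr = L[:, :, None] - L[:, None, :]
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    a, b = np.nonzero(np.triu((dist >= lag_lo) & (dist < lag_hi), k=1))
    if a.size == 0:
        return np.full(lr.shape[1:], np.nan)
    diff = lr[a] - lr[b]                      # log-ratio differences between paired stations
    return 0.5 * (diff ** 2).mean(axis=0)


# Hypothetical relative frequencies of 3 daily-precipitation classes at 5 gauges
X = np.array([[0.70, 0.25, 0.05],
              [0.68, 0.26, 0.06],
              [0.60, 0.30, 0.10],
              [0.55, 0.32, 0.13],
              [0.75, 0.20, 0.05]])
xy = np.array([[0, 0], [10, 5], [40, 20], [45, 30], [5, 60]], dtype=float)  # km

T = variation_matrix(X)
gamma_near = logratio_semivariogram(X, xy, 0.0, 20.0)   # pairs closer than 20 km
print(T.round(4))
print(gamma_near.round(4))
```

Both outputs are symmetric D x D matrices with a zero diagonal; repeating the semivariogram over several lag bins gives the variogram matrix from which a spatially aware distance can then be built.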

    Compositional data techniques for the analysis of the container traffic share in a multi-port region

    Statistical techniques based on compositional data are applied to investigate the evolution of the traffic share of container throughput in a multi-port system. Compositional vectors are those that contain relative information about the parts of some whole. Applying conventional statistical techniques to compositional data may lead to erroneous conclusions and spurious correlations; therefore, compositional data (CoDa) should be treated taking into account their own mathematical structure. The so-called log-ratio approach provides a set of transformations that allow conventional statistical techniques to be applied to the transformed compositional data. The objective of this paper is thus twofold. First, it introduces the CoDa formalism and highlights its potential for port container throughput analysis, as an example of a transport system, through an applied case: the evolution of container throughput in the Spanish Mediterranean port system during the period 1976–2015. Second, based on this analysis, it characterizes the container throughput in Spanish Mediterranean ports and its temporal evolution. The CoDa analysis clarifies the interpretation of, and the associations in, the evolution of container traffic throughput with respect to selected change points: the boom of containerization in the 1990s and the 2008 crisis. This contribution shows that the CoDa methodology is useful for investigating the complexity of transport disciplines in order to understand and manage the spatial integration that results from the movement of people and freight. Peer Reviewed. Postprint (published version)
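
Two properties behind the log-ratio approach can be checked numerically, as sketched below with hypothetical throughput figures for three unnamed ports: log-ratios between two ports do not change when another port is left out of the analysis (subcompositional coherence), whereas correlations between raw shares depend on which ports are included. The function names are illustrative, and the compositional centre shown is the closed geometric mean commonly used in CoDa in place of the arithmetic mean.

```python
import numpy as np


def closure(x):
    """Rescale each composition (last axis) to sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def clr(x):
    """Centred log-ratio transform: log of each part over the geometric mean."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)


def center(X):
    """Compositional centre: closed vector of part-wise geometric means."""
    return closure(np.exp(np.log(closure(X)).mean(axis=0)))


# Hypothetical container throughput (thousand TEU) of three ports over four years
teu = np.array([[400.0, 300.0, 100.0],
                [800.0, 450.0, 150.0],
                [1500.0, 700.0, 350.0],
                [2500.0, 900.0, 600.0]])
shares = closure(teu)
sub = closure(teu[:, :2])                 # subcomposition: first two ports only

# Log-ratios between ports do not change when a port is dropped ...
lr_full = np.log(shares[:, 0] / shares[:, 1])
lr_sub = np.log(sub[:, 0] / sub[:, 1])
# ... whereas the correlation of raw shares depends on which ports are included:
# in a two-part subcomposition it is forced to -1 whatever the data.
corr_full = np.corrcoef(shares[:, 0], shares[:, 1])[0, 1]
corr_sub = np.corrcoef(sub[:, 0], sub[:, 1])[0, 1]
print(center(teu), corr_full, corr_sub)
```

Working with log-ratios (via clr or ilr coordinates) therefore gives results that do not depend on which ports happen to be included in the system.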

    A compositional analysis approach assessing the spatial distribution of trees in Guadalajara, Mexico

    Urban green infrastructure, such as parks, gardens and trees, provides several ecosystem services and benefits. Trees in particular provide a broad range of services in urban areas, such as improving air quality, mitigating carbon pollution and the heat-island effect, attenuating storm-water floods, reducing noise and serving as habitat for different species, among others. Likewise, urban trees provide social (e.g., social cohesion), economic (e.g., increased property value), psychological (e.g., stress reduction) and medical (e.g., increased longevity) benefits (Landry, 2009; Roy et al., 2012; Battisti et al., 2019). Although it is well documented that trees are essential for the well-being and health of urban areas and their inhabitants, trees are not evenly distributed in urban areas. Previous studies have found that a deprived socioeconomic status among urban residents is associated with low urban tree coverage in their communities (Hernández and Villaseñor, 2017; Park and Kwan, 2017; Wang and Qiu, 2018). Environmental justice therefore seeks to ensure that green infrastructure and its benefits are distributed equally throughout the territory (Anguelovski, 2013; Gould and Lewis, 2017). The objective of this study is to determine whether urban trees in the city of Guadalajara, Mexico, are distributed equally among its colonias (neighbourhoods) and urban districts. The information is obtained from the first and only tree census conducted in the city, in June 2018, and processed with geographic information systems (GIS). The attributes of the tree dataset include location (urban blocks, streets, parks and gardens), height and canopy diameter (Government of Guadalajara, 2019). Owing to the compositional nature of the data, compositional analysis techniques are applied (see Aitchison, 1986; Pawlowsky-Glahn et al., 2015; Filzmoser et al., 2018). With this novel approach, we contribute to the existing literature. Additionally, principal component analysis (PCA) and cluster analysis are performed to identify the distribution of trees in the city. To examine the relationship between trees and socio-economic variables, a multivariable linear regression that respects the compositional nature of the data is carried out. The results from PCA and cluster analysis show a clear East-West differentiation in the distribution of trees across the city, mainly in the compositions by height and diameter. The multivariable linear regression finds statistically significant (p < 0.05) effects of socio-economic variables. Peer Reviewed. Postprint (author's final draft)
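
A PCA that respects the compositional nature of the data is commonly computed on clr-transformed values, as sketched below. This is a minimal illustration under assumptions: the height classes and per-district tree counts are hypothetical, not census data, and the compositional regression and cluster analysis are not reproduced.

```python
import numpy as np


def clr(X):
    """Centred log-ratio transform of each row (parts must be positive)."""
    logX = np.log(np.asarray(X, dtype=float))
    return logX - logX.mean(axis=1, keepdims=True)


def clr_pca(X):
    """PCA of compositions via SVD of the column-centred clr matrix."""
    Z = clr(X)
    Zc = Z - Z.mean(axis=0)
    U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    scores = U * s                         # district coordinates on the components
    explained = s ** 2 / (s ** 2).sum()    # share of total variance per component
    return scores, Vt, explained


# Hypothetical tree counts by height class (<3 m, 3-6 m, 6-12 m, >12 m) in six districts
counts = np.array([[120, 340, 210, 40],
                   [90, 310, 260, 75],
                   [300, 280, 90, 10],
                   [250, 300, 120, 20],
                   [60, 200, 300, 140],
                   [180, 330, 150, 35]])
scores, loadings, explained = clr_pca(counts)
print(explained.round(3))      # at most D - 1 = 3 non-negligible components
print(loadings[:2].round(2))   # clr loadings of the first two components
```

Because clr is scale invariant, raw counts and closed shares give the same result, and the loadings of each informative component sum to zero, so they read as contrasts between height classes rather than absolute amounts.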

    A compositional approach for modelling SDG7 indicators: case study applied to electricity access

    Monitoring energy indicators has acquired renewed interest with the 2030 Agenda for Sustainable Development, and specifically with Goal 7 (SDG7), which seeks to guarantee universal access to energy. The predominant criteria for monitoring SDG7 are given as a set of individual indicators; in this regard, the UN indicators proposed at the 47th session of the UN Statistical Commission are a practical starting point. A relevant characteristic of these indicators is that they can be expressed as proportions of a whole, i.e., they are compositions. Notably, applying traditional multivariate models directly to indicators that are proportions, without an intermediate transformation, can lead to spurious analyses. Here, we assess the application of compositional data analysis (CoDa) to follow the temporal trends of energy-sector indicators in the context of SDG7, with a case study of the most affected areas addressing the problem of electricity access. Following the CoDa methodology, we first use a log-ratio transformation to bring compositions to real space and then apply three multivariate methods: linear regression, generalized additive models and support vector machines. We also address other characteristic problems of electricity access indicators, such as data quality, which was treated by considering models with interactions. In sum, CoDa facilitates a controlled management of the parts that make up population-based indicators, suggesting that modelling the evolution of compositions as individual components (even the standard splitting of country data into rural and urban "access to" indicators) should be avoided. This research has been partially funded by the Ministerio de Economía y Competitividad del Gobierno de España (MINECO/FEDER, Ref: MTM2015-65016-C2-2-R) and by the Agència de Gestió d'Ajuts Universitaris i de Recerca de la Generalitat de Catalunya (Ref. 2017 SGR 656 and 2017 SGR 1496). Peer Reviewed. Postprint (author's final draft)
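
One way to model rural and urban access jointly rather than as separate indicators is to treat the population as a four-part composition and work with balances from a sequential binary partition, as sketched below. This is an illustration under assumptions: the partition, the population fractions and the straight-line trend are hypothetical choices, not the article's specification, and the GAM and SVM models are not reproduced.

```python
import numpy as np


def sbp_basis(signs):
    """Orthonormal balance basis from a sequential binary partition (+1 / -1 / 0)."""
    signs = np.asarray(signs, dtype=float)
    V = np.zeros_like(signs)
    for k, row in enumerate(signs):
        r, s = np.sum(row > 0), np.sum(row < 0)
        coef = np.sqrt(r * s / (r + s))
        V[k, row > 0] = coef / r
        V[k, row < 0] = -coef / s
    return V


# Parts: urban with access, urban without, rural with access, rural without
SBP = [[+1, +1, -1, -1],   # urban vs rural population
       [+1, -1, 0, 0],     # access vs no access within urban
       [0, 0, +1, -1]]     # access vs no access within rural
V = sbp_basis(SBP)


def to_balances(x):
    """ilr (balance) coordinates of compositions in the rows of x."""
    logx = np.log(np.asarray(x, dtype=float))
    return (logx - logx.mean(axis=-1, keepdims=True)) @ V.T


def from_balances(b):
    """Back-transform balances to closed four-part compositions."""
    z = np.exp(np.atleast_2d(b) @ V)
    return z / z.sum(axis=-1, keepdims=True)


# Hypothetical population fractions for one country (rows: years)
years = np.array([2000, 2005, 2010, 2015])
X = np.array([[0.28, 0.07, 0.20, 0.45],
              [0.31, 0.06, 0.24, 0.39],
              [0.35, 0.05, 0.27, 0.33],
              [0.38, 0.04, 0.30, 0.28]])
B = to_balances(X)
fits = [np.polyfit(years, B[:, k], 1) for k in range(B.shape[1])]
proj = from_balances(np.array([np.polyval(f, 2030) for f in fits]))[0]
urban_access = proj[0] / (proj[0] + proj[1])
rural_access = proj[2] / (proj[2] + proj[3])
national_access = proj[0] + proj[2]   # consistent with the two rates by construction
print(round(urban_access, 3), round(rural_access, 3), round(national_access, 3))
```

Since all three figures are read off one projected composition, the national rate always equals the population-weighted mix of the urban and rural rates, which separate per-indicator models do not guarantee.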

    Tree ecosystem services, for everyone? A compositional analysis approach to assess the distribution of urban trees as an indicator of environmental justice

    Trees provide a broad range of ecosystem services in urban areas. Although it is well documented that trees are essential for the well-being and livability of cities, they are often not evenly distributed. Studies have found that a deprived socioeconomic status among urban residents is associated with lower coverage of, and access to, urban trees in their communities, yet a fair distribution of trees contributes to the sustainability and resilience of cities. In this context, the environmental justice movement seeks to ensure an equal distribution of green infrastructure and its benefits throughout a territory. The objective of this study is threefold: (i) to determine whether urban trees in Guadalajara, Mexico, are distributed equally; (ii) to assess the association between urban trees and socioeconomic status; and (iii) to introduce compositional data analysis to the existing literature. Owing to the compositional nature of the data, compositional analysis techniques are applied. We believe this novel approach will help define the proper management of data used in the literature. The outcomes provide insights for urban planners working towards the Sustainable Development Goals to help eradicate the uneven distribution of urban trees in cities. Peer Reviewed. Postprint (published version)